In computer vision, image classification is one of the core image processing tasks. Fish
classification is now a widely studied problem in machine learning and image segmentation, and it
has been extended to a variety of domains, such as marketing strategies. This paper presents an
effective fish classification method based on convolutional neural networks (CNNs). The experiments
were conducted on a new dataset of Bangladesh's indigenous fish species with three train-test
splits: 80-20%, 75-25%, and 70-30%. We provide a comprehensive comparison of five popular
state-of-the-art gradient descent-based optimizers for CNNs, namely adaptive delta (AdaDelta),
stochastic gradient descent (SGD), adaptive moment estimation (Adam), Adamax (a variant of Adam
based on the infinity norm), and root mean square propagation (Rmsprop). Overall, the experimental
results show that Rmsprop, Adam, and Adamax performed well compared to the other optimization
techniques, while AdaDelta and SGD performed the worst. Furthermore, the Adam optimizer attained
the best performance measures in the 70-30% and 80-20% splitting experiments, while the Rmsprop
optimizer attained the best performance measures in the 75-25% splitting experiment. Finally, the
proposed model is compared with state-of-the-art deep CNN models. The proposed model attained the
best classification accuracy among them, 98.46%.
1. INTRODUCTION


In recent years, computer science and technology have played a key role in many areas, such as the
internet of things [1], network security [2], object detection, scene classification [3], and remote sensing [4].
Scene classification plays a key role in daily life because the appearance of scenes and their environment
change. Fish classification (FC) has become a vital area of study for aquaculture and conservation. FC is defined
as the process of distinguishing and recognizing fish species and families from their attributes by using
image processing. It assigns a target fish to a species according to its similarity to
the representative specimen image [5]. The recognition of fish species is widely considered a challenging
research area because of difficulties such as distortion, noise, and segmentation errors in the images
[6]. Experts face difficulties in identifying and classifying fish because of the large number of fish categories [7].
Previous works have focused mainly on environments, although the need for FC and recognition has
grown. Machine learning algorithms are among the most widely used approaches for FC
[8]. Generally, fish identification can be categorized into two groups [9]: i) classification through
internal identification [10], [11], in which attributes such as the primary structural framework and length are
extracted, a fish expert database is established, and the fish is identified with the help of an algorithm
[12]; and ii) classification through the identification of the exterior part of the fish [13], [14]. An increasing
number of studies have found that the basic and effective strategy is to take pictures of fish with image
capture devices. The captured pictures can then be compared with existing pictures and fish
identification books, so that each fish can be assigned to its corresponding class
[13]. Several approaches in the literature classify fish species based on structural and textural
patterns [15], [16]. Hsiao et al. [17] applied a sparse representation combined with principal component
analysis to fish-species classification and attained an accuracy of 81.8%. Alsmadi et al. [13] introduced a
fish classification model that combined extracted features with statistical measurements.
Some works carried out fish classification by using the backpropagation algorithm and support vector
machines (SVMs) [18], [19]. Islam et al. [20] proposed a hybrid local binary pattern to classify indigenous fish in
Bangladesh. They generated a new dataset named BDIndigenousFish2019, then used an SVM with different
kernel sizes for indigenous fish classification and attained an accuracy of 94.97%. More recently, deep learning
(DL) has been gaining much attention in image classification [9].

Rathi et al. [21] introduced a technique to classify 21 fish species based on deep learning and attained
an accuracy of 96.29%. Khalifa et al. [8] introduced a deep learning model to classify aquarium fish species
and attained an accuracy of 85.59%. Deep learning has demonstrated remarkable FC results on large-scale training
datasets of fish images [22]-[24]. Kratzert and Mader [25] introduced an automatic system that used an adapted
VGG network for FC. Chhabra et al. [26] proposed a hybrid deep learning (HDL) approach for FC. Abinaya et al.
[27] introduced an FC technique that combined three trained deep learning networks based on naive Bayesian
fusion (DLN-NB).

The fish classification problem is to distinguish a fish and assign it to its species precisely. In light of
recent studies in FC, this paper proposes a new CNN-based classification model for the indigenous
fish dataset. Our model is trained on eight distinct types of indigenous fish from Bangladesh.
The success rates of our classification model on the indigenous fish dataset for the three data splits
(70-30%, 75-25%, and 80-20%) are 98.47%, 97.24%, and 97.70%, respectively. This paper makes the following
contributions:


- To the best of our knowledge, no CNN-based study in the literature classifies the BDIndigenousFish2019
dataset.
- We propose a new classification model based on CNN to classify the BDIndigenousFish2019 dataset.
- This study includes an analysis of five different state-of-the-art gradient descent-based optimizers.
- This study includes a comparison of the proposed CNN with state-of-the-art methods.


The rest of this paper is organized as follows: section 2 briefly describes gradient
descent-based optimizers; section 3 introduces some related deep convolutional neural networks; section 4
introduces the materials and the proposed method; section 5 presents the experimental results and analysis;
section 6 provides the discussion, and section 7 concludes this paper.


2. GRADIENT DESCENT BASED OPTIMIZERS 

Many factors play a critical role in the efficiency of a convolutional neural network, such as
optimization, batch size, number of epochs, learning rate, activation function, and network architecture [28].
Optimization algorithms influence machine learning mainly by tuning the learnable parameters so that the
model converges faster and training consumes fewer resources. Deep learning
often requires a lot of time and powerful computing resources to carry out the training process, which is a
major factor impeding the development of deep learning algorithms. Although multi-computer
distributed training can accelerate learning, it does not reduce the required computing resources.
Therefore, to reduce the error rate during the training process in CNN-based techniques, many gradient
descent-based optimization algorithms have been used [29], such as AdaDelta, SGD, Adam, Adamax, and Rmsprop.
The following subsections briefly describe the gradient descent-based optimization algorithms that are used in
this study.


2.1. Stochastic gradient descent (SGD) optimization algorithm 

The SGD process starts from a random point and moves in fixed-size steps toward a minimum of the error
function, but this requires a large number of iterations because of the randomness [30]. The learning rate
does not change during the training process. The following equation shows the SGD update for linear regression:

$$w \leftarrow w - \eta \nabla E_i(w) \tag{1}$$

where $E_i(w)$ represents the error estimated on the $i$-th data sample, $E$ denotes the error function, and
$\eta$ is the learning rate. The SGD algorithm therefore searches for the best $w$ that minimizes $E$. For
comparison, the following equation shows the update of regular gradient descent:

$$w \leftarrow w - \eta \nabla E(w) \tag{2}$$

where the error objective is estimated by (3):

$$E(w) = \sum_i E_i(w) \;\Rightarrow\; \nabla E(w) = \sum_i \nabla E_i(w) \tag{3}$$
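
As a minimal sketch of update (1), the snippet below fits a one-variable linear regression with per-sample SGD steps. The squared-error form of $E_i$, the toy data, and the names `sgd_linear_regression`, `eta`, and `epochs` are illustrative assumptions and not settings taken from this study.

```python
import random

def sgd_linear_regression(xs, ys, eta=0.05, epochs=500, seed=0):
    """Fit y ~ w*x + b with per-sample updates w <- w - eta * grad E_i(w)."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # random starting point
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)  # visit the samples in random order
        for i in idx:
            err = (w * xs[i] + b) - ys[i]  # residual of sample i
            # Assumed E_i = 0.5 * err**2, so dE_i/dw = err * x_i and dE_i/db = err
            w -= eta * err * xs[i]
            b -= eta * err
    return w, b

# Hypothetical noiseless data generated from y = 2x + 1
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2 * x + 1 for x in xs]
w, b = sgd_linear_regression(xs, ys)
print(f"w={w:.3f}, b={b:.3f}")
```

The learning rate `eta` stays fixed for the whole run, which mirrors the constant learning rate described above.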


2.2. Adaptive delta (AdaDelta) optimization algorithm 

AdaDelta was developed to mitigate the aggressive, strictly decreasing learning rate of the adaptive
gradient algorithm (AdaGrad) [31]. Unlike the AdaGrad optimization algorithm, which accumulates all previous
squared gradients [32], AdaDelta restricts the accumulated past gradients to a fixed window size. In other words,
the AdaDelta algorithm builds on the steepest descent direction expressed by the negative gradient as (4):

$$\Delta x_t = -\eta g_t \tag{4}$$

where $g_t = \frac{\partial f(x_t)}{\partial x_t}$ represents the gradient at the $t$-th iteration, and $\eta$
denotes a learning rate.
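
The windowed accumulation can be sketched as follows. This is a hedged illustration of the commonly stated AdaDelta rule, in which the fixed learning rate of (4) is replaced by a ratio of running root-mean-square values; the decay `rho`, the constant `eps`, the step count, and the toy objective are assumptions and not values used in this study.

```python
import math

def adadelta_minimize(grad, x0, rho=0.95, eps=1e-6, steps=500):
    """Minimize a 1-D function with AdaDelta, given its gradient function."""
    x = x0
    avg_sq_grad = 0.0  # decaying average of squared gradients
    avg_sq_step = 0.0  # decaying average of squared updates
    for _ in range(steps):
        g = grad(x)
        avg_sq_grad = rho * avg_sq_grad + (1 - rho) * g * g
        # The step size is a ratio of RMS values, so no global learning rate is set
        step = -math.sqrt(avg_sq_step + eps) / math.sqrt(avg_sq_grad + eps) * g
        avg_sq_step = rho * avg_sq_step + (1 - rho) * step * step
        x += step
    return x

# Hypothetical toy objective f(x) = (x - 3)^2 with gradient 2(x - 3)
x_final = adadelta_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(x_final)
```

Because the decaying averages weight recent gradients most heavily, they act as the fixed-size window described above without storing the past gradients explicitly.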


2.3. Root mean square propagation (Rmsprop) optimization algorithm 
Rmsprop is derived from the adaptive gradient algorithm (AdaGrad) [33]. It divides the learning
rate of each weight by a running average of the magnitudes of recent gradients for that weight, and thereby
maintains a learning rate for each parameter (i.e., the overall learning rate is almost constant). However, it
accumulates the squared gradients as an exponentially decaying average rather than as a sum. The algorithm
performs very well on non-stationary problems. The running average and the weight update can be estimated
by (5) and (6):

$$E[g^2]_t = \gamma\, E[g^2]_{t-1} + (1 - \gamma)\, g_t^2, \quad \gamma = 0.9 \tag{5}$$

$$w_{t+1} = w_t - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}}\, g_t \tag{6}$$

where $E[g^2]_t$ represents the running average of the squared gradients, $\gamma$ is the decay term, $g_t$
represents the gradient at step $t$, $\epsilon$ is a tiny number that prevents division by zero, and $\eta$
represents the initial learning rate.
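
A minimal sketch of the standard Rmsprop update on a one-dimensional toy problem is given below. The function name `rmsprop_minimize`, the default values of `eta`, `gamma`, and `eps`, and the quadratic objective are illustrative assumptions, not the configuration used in the experiments.

```python
import math

def rmsprop_minimize(grad, w0, eta=0.01, gamma=0.9, eps=1e-8, steps=1000):
    """Minimize a 1-D function with Rmsprop, given its gradient function."""
    w = w0
    avg_sq_grad = 0.0  # running average of squared gradients
    for _ in range(steps):
        g = grad(w)
        # Exponentially decaying average instead of AdaGrad's growing sum
        avg_sq_grad = gamma * avg_sq_grad + (1 - gamma) * g * g
        # Scale the step by the root of the running average
        w -= eta / math.sqrt(avg_sq_grad + eps) * g
    return w

# Hypothetical toy objective f(w) = (w - 3)^2 with gradient 2(w - 3)
w_final = rmsprop_minimize(lambda w: 2 * (w - 3), w0=0.0)
print(w_final)
```

Since each step is normalized by the recent gradient magnitude, the step length stays close to `eta` while the gradient is steady, which is why the iterate ends up hovering near the minimum rather than settling on it exactly.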


2.4. Adaptive moment estimation (Adam) and Adamax optimization algorithms

The Adam optimization algorithm is an extension of the SGD algorithm and has recently been widely
used in deep learning applications, particularly computer vision and natural language processing tasks [34].
Adam differs from SGD in that SGD maintains a single learning rate for all weight updates, whereas Adam
adapts the learning rate of each weight of the neural network iteratively based on the training data. Adam
calculates the adaptive learning rates from the average of the first moment of the gradient and, like the
Rmsprop algorithm, from the average of the second moment of the gradient. The Adam optimization algorithm
can be estimated as (7) to (10):

$$f_t = \zeta_1 f_{t-1} + (1 - \zeta_1)\, g_t \tag{7}$$

$$S_t = \zeta_2 S_{t-1} + (1 - \zeta_2)\, g_t^2 \tag{8}$$

$$\Delta w_t = -\eta \frac{f_t}{\sqrt{S_t + \epsilon}} \tag{9}$$

$$w_{t+1} = w_t + \Delta w_t \tag{10}$$

where $\zeta_1, \zeta_2$ are hyperparameters; $\eta$, $g_t$, $f_t$, and $S_t$ represent the initial learning rate,
the gradient at time $t$, the exponential average of the gradients along $w$, and the exponential average of the
squared gradients, respectively; and $\epsilon$ is a tiny number that prevents division by zero. The Adamax
optimization algorithm is inspired by the Adam algorithm; it replaces the second-moment average with an
exponentially weighted infinity norm, which provides a simpler bound on the magnitude of the update [35], as (11):

$$u_t = \max(\zeta_2 \cdot u_{t-1}, |g_t|) \tag{11}$$

where $u_t$ is the exponentially weighted infinity norm.
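
The two update rules can be compared side by side in a short sketch. It uses the standard moment averages without the bias-correction terms that common library implementations add; the names `adam_step` and `adamax_step`, the default hyperparameters, and the toy objective are assumptions made for illustration only.

```python
import math

def adam_step(w, g, f, s, eta=0.01, z1=0.9, z2=0.999, eps=1e-8):
    """One Adam update; f and s are the running first and second moments."""
    f = z1 * f + (1 - z1) * g        # exponential average of gradients
    s = z2 * s + (1 - z2) * g * g    # exponential average of squared gradients
    w = w - eta * f / math.sqrt(s + eps)
    return w, f, s

def adamax_step(w, g, f, u, eta=0.01, z1=0.9, z2=0.999):
    """One Adamax update; u is the exponentially weighted infinity norm."""
    f = z1 * f + (1 - z1) * g
    u = max(z2 * u, abs(g))          # infinity norm replaces the second moment
    if u > 0:                        # skip the step while no gradient has been seen
        w = w - eta * f / u
    return w, f, u

def grad(w):
    """Gradient of the hypothetical toy objective f(w) = (w - 3)^2."""
    return 2 * (w - 3)

w_adam, f, s = 0.0, 0.0, 0.0
for _ in range(2000):
    w_adam, f, s = adam_step(w_adam, grad(w_adam), f, s)

w_adamax, f, u = 0.0, 0.0, 0.0
for _ in range(2000):
    w_adamax, f, u = adamax_step(w_adamax, grad(w_adamax), f, u)

print(w_adam, w_adamax)
```

The only structural difference between the two functions is the denominator: a root of an averaged square in Adam, and a running maximum of absolute gradients in Adamax.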


3. DEEP CONVOLUTIONAL NEURAL NETWORKS 

Deep learning is an area of machine learning that utilizes hierarchical architectures to learn high-level
data representations in many applications [36]. Moreover, the data representation can be enhanced by increasing
the number of layers [37]. In deep learning, the feature extractors and the classifier are trained simultaneously.
The initial layers, including convolution filters, non-linear transformations, and pooling
layers, are utilized for feature extraction. Lastly, the fully connected layers carry out the classification.
Convolutional neural networks (CNNs), in which many layers are robustly trained and validated, are among the
most effective deep learning techniques. A standard CNN consists of three main layer types: convolutional layers,
pooling layers, and fully connected layers. CNNs can extract information even when the datasets
vary widely in context and in the objects present in the images, based on their colour, structure,
and surface characteristics [38]. Leading pre-trained deep convolutional neural
networks include AlexNet [39], VGGNet [40], and ResNet [41], and the use and range of applications of
pre-trained networks are growing.


3.1. AlexNet 

AlexNet was designed by Alex Krizhevsky, and it is one of the prominent deep CNNs used in many
applications. The AlexNet architecture consists of 5 convolutional layers, 3 max-pooling layers, 3 fully
connected layers, and a classifier layer as the output layer [39].


3.2. VGGNet 

VGGNet was designed by Simonyan and Zisserman to reduce the number of parameters in the layers and to
improve training time; all of its convolutional kernels are of size 3×3. There are several
variants of VGGNet, such as VGG16 and VGG19, which differ in the number
of weight layers in the network. However, the drawbacks of VGGNet include time-consuming training and a
large number of parameters [40].


3.3. ResNet 

The ResNet architecture was developed in [41]. It is much deeper than VGGNet. There
are multiple versions of ResNet, such as ResNet50 and ResNet101. The main contribution of ResNet is
the introduction of a so-called “identity shortcut connection” that skips one or more layers [39].


4. MATERIALS AND METHODS 
4.1. Image dataset 

We trained our model on the BDIndigenousFish2019 (BD2019) dataset, which contains 2610 images of eight fish
species from Bangladesh. The BD2019 fish dataset was first used in [20] for an approach named HLBP, an FC
method that uses hybrid features with an SVM classifier. Because HLBP is not a DL-based method, it is not fair
to compare its performance with DL-based methods. Figure 1 illustrates a sample image of each species, and the
distribution of the species is shown in Figure 2. Images were resized to 224×224 as required by the model.

Figure 1. Sample images of BD2019 dataset


Figure 2. Distribution of BD2019 fish species
(species shown: Sing, Byen, Taki, Sol, Koi, Sorputi, Tengra, Foli)


4.2. The proposed model 


The architecture of our FC model is shown in Figure 3. CNNs have several advantages over
conventional strategies. Weight sharing in convolutional layers
decreases the number of parameters and makes it easier to detect various attributes, such as edges and corners.
A pooling layer addresses the known sensitivity of the output feature map to the location of
the input features, thereby providing invariance to changes in the position of the extracted features.
The batch normalization layer makes the training of a deep network more robust and stable by reducing
internal covariate shift.


The proposed model consists of a series of steps:

- The input layer carries an image of size 224×224×3, which moves into the first convolutional layer with
32 feature maps.
- After passing through a non-linear activation function (ReLU) and batch normalization, the output passes
through a max-pooling layer, which reduces the dimensions to 112×112×32.
- The second convolutional layer takes the previous layer's output as input and has 64 feature maps. Its output
moves into a non-linear activation function (ReLU), batch normalization, and then a max-pooling layer, so the
output is reduced to 56×56×64.
- The third convolutional layer has 128 feature maps, followed by a non-linear activation function (ReLU),
batch normalization, and a max-pooling layer, which reduces the dimensions to 28×28×128.
- The fourth convolutional layer has 256 feature maps, followed by a non-linear activation function (ReLU),
batch normalization, and a max-pooling layer, which reduces the dimensions to 14×14×256. It is worth noting
that for convolutional layers 1 to 4, the size of each kernel was 3×3 with a stride of 1, and the filter size
of the max-pooling layers was 2×2 with a stride of 2.
- The fifth to seventh convolutional layers are connected back-to-back. These convolutional layers use 512,
256, and 128 feature maps, respectively, followed by a non-linear activation function (ReLU), batch
normalization, and then a max-pooling layer. It is worth noting that for convolutional layers 5 to 7, the size
of each kernel was 5×5 with a stride of 1, and the filter size of the max-pooling layers was 2×2 with a stride of 2.
- The eighth convolutional layer has 64 feature maps, and the size of each kernel is 7×7 with a stride of
1. After a non-linear activation function (ReLU), the convolutional layer's output is flattened and passed
through a fully connected layer with 576 units. It is then connected to a fully connected layer
with 128 units.
- The output then passes through a dropout layer with a rate of 0.3 and is connected to a fully connected layer
with 256 units. The softmax layer is used as the output layer with eight units, matching the number of classes
in the dataset.
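
The spatial sizes of the first four blocks can be checked with a short calculation. The sketch assumes "same" padding for the 3×3 convolutions, which is not stated in the text but is consistent with the sizes 56×56×64, 28×28×128, and 14×14×256 reported for blocks two to four; the helper name `conv_out` is hypothetical.

```python
def conv_out(size, kernel, stride=1, padding="same"):
    """Spatial output size of a convolution or pooling window."""
    if padding == "same":
        return -(-size // stride)          # ceil(size / stride)
    return (size - kernel) // stride + 1   # "valid": no padding

size, shapes = 224, []
for filters in (32, 64, 128, 256):
    size = conv_out(size, kernel=3, stride=1, padding="same")   # 3x3 conv, stride 1
    size = conv_out(size, kernel=2, stride=2, padding="valid")  # 2x2 max-pool, stride 2
    shapes.append((size, size, filters))

for i, shape in enumerate(shapes, start=1):
    print(f"after block {i}: {shape}")
```

Each 2×2 max-pooling layer with stride 2 halves the height and width, so under these assumptions the 224×224 input shrinks to 112, 56, 28, and 14 pixels per side across the four blocks.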


Figure 3. The proposed fish classification schematic. Conv2D: convolution layer, MP: MaxPooling layer,
FC: Fully connected layer


5. EXPERIMENTAL RESULTS AND ANALYSIS 
The experiments were conducted in Python 3.7 on a computer with an Intel Core i7-6700HQ CPU at 2.60
GHz, 16 GB of RAM, and a GTX 960 GPU. The experiments were conducted on three train-test splits, 80-20%, 75-25%,
and 70-30%, with a comparative analysis of different optimization algorithms. Further, we provide an
experiment with an augmentation approach. Table 1 lists the parameter settings of the proposed model. The
softmax function is used at the last layer; the other layers use the ReLU activation. Since the dataset is
imbalanced, classification accuracy alone may not be a reliable measure, particularly for a multi-class
classification task. Therefore, a confusion matrix is computed for each data split, which yields more
information, i.e., what the classification model gets right and which errors it makes. The performance measures
computed from the confusion matrix include accuracy, sensitivity, and specificity [7].


- Sensitivity is the proportion of positives that are correctly identified. It can
be estimated as (12):

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{12}$$

where TP denotes the number of samples of the actual class that are correctly classified, and FN denotes the
number of samples of the actual class that are not correctly classified.
- Specificity is the proportion of negatives that are correctly identified. It can be estimated
as (14):

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{14}$$

where TN denotes the number of samples of the other classes that are correctly not assigned to the actual
class, and FP denotes the number of samples of the other classes that are incorrectly assigned to the actual class.
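
For a multi-class task, these measures can be computed per class in a one-vs-rest manner from the confusion matrix. The sketch below is one possible implementation; the function names and the small 3-class matrix are hypothetical and are not taken from the experiments.

```python
def per_class_metrics(cm):
    """Return {class_index: (sensitivity, specificity)} from a square confusion matrix.

    cm[i][j] counts samples of actual class i that were predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    metrics = {}
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # class k predicted as another class
        fp = sum(cm[i][k] for i in range(n)) - tp  # other classes predicted as class k
        tn = total - tp - fn - fp                  # everything else
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        metrics[k] = (sens, spec)
    return metrics

def accuracy(cm):
    """Overall accuracy: correctly classified samples over all samples."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

# Hypothetical 3-class confusion matrix (rows: actual, columns: predicted)
cm = [
    [8, 1, 1],
    [0, 9, 1],
    [2, 0, 8],
]
m = per_class_metrics(cm)
print(accuracy(cm), m)
```

Averaging the per-class values is one way to obtain a single sensitivity and specificity figure per optimizer; whether the reported figures were averaged this way is not stated in the text.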


Table 1. Hyper-parameters for the proposed method, used during training and testing

Parameter              Value
Activation function    ReLU (rectified linear unit); Softmax (output layer)
Learning rate          1e-4
Epochs                 50
Steps per epoch        10
Batch size             32
Loss function          Categorical cross entropy


5.1. Experiments on 70-30% data splitting 

In this experiment, we divided the dataset, which comprises eight classes, into 70% for training and 30%
for testing. The test set is used as validation data to validate our model; therefore, the validation accuracy
of the final epoch is used as the test accuracy. Moreover, we performed the analysis on five optimizers.
The performance results were obtained over 50 epochs. For this experiment, the most successful
optimizer is Adam, which attained 98.47% testing accuracy. The Adamax and Rmsprop optimizers also performed
well, attaining 94.89% and 93.74%, respectively. The performances of these optimizers are shown in
Figure 4(a) to Figure 4(d). We can observe that the SGD and AdaDelta optimizers performed very
poorly. Table 2 lists the accuracy, sensitivity, and specificity of the five optimizers on the testing data
for this experiment. From Table 2, it can be observed that the Adam optimizer performed better than the other
optimizers. The confusion matrix of this experiment with the Adam optimizer is given in Table 3.


Table 2. Evaluation metric on testing data for 70-30% splitting 
Optimizers Accuracy % Sensitivity % Specificity % 


Adamax 94.89 93.24 95.38 
Adam 98.46 97.13 99.04 
Rmsprop 93.74 94.21 93.24 
AdaDelta 27.71 24.26 28.60 
SGD 52.61 49.24 54.26 

Figure 4. Proposed model training and validation analysis using 70-30% splitting over 50 epochs
(a) validation accuracy, (b) training accuracy, (c) validation loss, and (d) training loss 

Table 3. Confusion matrix of the proposed model on the testing dataset concerning 70-30% splitting using
Adam optimizer (rows: actual class, columns: predicted class)

Actual        Byen    Foli     Koi    Sing     Sol Sorputi    Taki  Tengra
Byen           148       0       1       0       0       1       0       0
Foli             0      88       0       0       1       0       1       0
Koi              1       0     113       0       0       0       0       0
Sing             1       0       0     118       0       1       0       0
Sol              0       1       0       0      34       0       1       0
Sorputi          0       1       1       0       0      58       0       0
Taki             0       1       0       1       0       0     115       0
Tengra           0       0       0       1       0       0       0      95


5.2. Experiments on 75-25% data splitting 

In this section, the same experiments and analyses were carried out, except that we divided the dataset into
75% for training and 25% for testing. The performance of the optimizers in this experiment is illustrated in
Figure 5(a) to Figure 5(d) (see appendix). We can observe that the most successful optimizer is
Rmsprop, which attained 97.24% testing accuracy. The Adamax and Adam optimizers also performed well,
attaining 96.63% and 94.18%, respectively. The SGD and
AdaDelta optimizers performed very poorly. Table 4 lists the evaluation metrics on the testing data for this
experiment. From Table 4, it can be observed that the Rmsprop optimizer performed better than the other
optimizers. The confusion matrix of this experiment with the Rmsprop optimizer is given in Table 5.


Table 4. Evaluation metric on testing data for 75-25% splitting. The accuracy, sensitivity, and specificity rate 
on five optimizers as shown in Figure 5 
Optimizers Accuracy % Sensitivity % Specificity % 


Adamax 96.63 95.33 97.17 
Adam 94.18 94.34 95.89 
Rmsprop 97.24 96.51 98.14 
AdaDelta 30.62 27.48 33.06 
SGD 49.04 48.52 50.78 


Table 5. Confusion matrix of the proposed model on the testing dataset concerning 75-25% splitting using
Rmsprop optimizer (rows: actual class, columns: predicted class)

Actual        Byen    Foli     Koi    Sing     Sol Sorputi    Taki  Tengra
Byen           121       0       1       0       0       1       1       1
Foli             0      72       0       0       1       0       1       0
Koi              0       0      92       1       0       1       0       1
Sing             1       0       0      98       0       1       0       0
Sol              0       1       0       0      29       0       0       0
Sorputi          0       0       1       0       0      49       0       0
Taki             0       1       1       1       0       0      94       0
Tengra           1       1       0       0       0       0       0      78


5.3. Experiments on 80-20% data splitting 

In this section, another experiment was carried out in which we divided the dataset into 80% for training
and 20% for testing. The performance of the optimizers in this experiment is illustrated in
Figure 6(a) to Figure 6(d) (see appendix). We can observe that the most successful optimizer is Adam,
which attained 97.70% testing accuracy. The Adamax and Rmsprop optimizers also performed well,
attaining 95.01% and 93.67%, respectively. The SGD and
AdaDelta optimizers performed very poorly. Table 6 lists the evaluation metrics on the testing data for this
experiment. From Table 6, it can be observed that the Adam optimizer performed better than the other
optimizers. The confusion matrix of this experiment with the Adam optimizer is given in Table 7.

Table 6. Evaluation metric on testing data for 80-20% splitting. The accuracy, sensitivity, and specificity rate
on five optimizers as shown in Figure 6 
Optimizers Accuracy % Sensitivity % Specificity % 


Adamax 95.01 93.78 96.09 
Adam 97.70 96.24 98.24 
Rmsprop 93.67 92.70 94.24 
AdaDelta 28.92 26.89 29.24 
SGD 51.91 49.89 52.41 


Table 7. Confusion matrix of the proposed model on the testing dataset concerning 80-20% splitting using
Adam optimizer (rows: actual class, columns: predicted class)

Actual        Byen    Foli     Koi    Sing     Sol Sorputi    Taki  Tengra
Byen            97       1       0       1       0       0       0       1
Foli             1      59       0       0       0       0       0       0
Koi              1       0      74       0       1       0       0       0
Sing             0       0       0      79       0       0       1       0
Sol              0       0       0       0      23       0       0       1
Sorputi          0       0       1       0       0      39       0       0
Taki             0       0       0       0       1       1      76       0
Tengra           0       0       0       0       0       0       1      63


6. DISCUSSION 

Using a CNN to classify fish species requires thousands or even tens of thousands of training
samples. Collecting enough fish species images can therefore be a hard demand to meet.
In this paper, we addressed these issues, which directly decrease the performance of the fish
classification task, and introduced a new model with high classification accuracy. Among the five optimizers,
Adamax was the steadiest, whereas AdaDelta and SGD performed worse. Moreover, the Adam, Adamax, and Rmsprop
optimizers attained good accuracy at 20 epochs, while the SGD and AdaDelta optimizers did not attain it even
after 50 epochs. The above discussion shows that our proposed CNN architecture provides promising results for
fish classification with suitable optimization algorithms, which underlines the importance of choosing the
hyperparameters of the network. We found that our model performed very well without data augmentation.
We also compared our work with state-of-the-art deep CNN models, including AlexNet, VGG-16, VGG-19,
ResNet50, adaptive-VGG [25], HDL [26], and DLN-NB [27]. Table 8 (see appendix) compares the results of
these deep CNN models with those of the developed model. It is worth noting that the deep
CNN models were trained from scratch, and their results were obtained after 100 iterations.


7. CONCLUSION 

This paper introduced a fish classification model and a comparative analysis of five optimizers for the
proposed CNN model under three data splits. The comparison was made on the publicly available
BDIndigenousFish2019 dataset, which makes the research results more reproducible and comparable. The results
showed that three optimizers (Adam, Adamax, and Rmsprop) performed consistently well. The Adam optimizer
performed best in the 70-30% and 80-20% experiments, whereas Rmsprop performed best in the 75-25%
experiment. These findings reinforce the significance of choosing the hyperparameters of the network used for
classification. This paper demonstrated that state-of-the-art results can be achieved for fish classification
through deep CNNs. The experimental results show that the proposed strategy is effective and dependable
compared with existing deep CNN models. Further study could examine whether our model can be employed in
other classification tasks. It would also be interesting to investigate whether the results can be improved
using other artificial intelligence methods, such as generative adversarial networks (GANs) and different
transfer learning methods.

APPENDIX

Figure 5. Proposed model training and validation analysis using 75-25% splitting over 50 epochs
(a) validation accuracy, (b) training accuracy, (c) validation loss, and (d) training loss

Figure 6. Proposed model training and validation analysis using 80-20% splitting over 50 epochs
(a) validation accuracy, (b) training accuracy, (c) validation loss, and (d) training loss

Table 8. Comparison of results based on deep CNNs models and the developed model


Model                  Number of iterations   Accuracy %   Sensitivity %   Specificity %
AlexNet [39]           100                    85.23        84.43           86.21
VGGNet-16 [40]         100                    78.56        77.12           79.23
VGGNet-19 [40]         100                    76.88        75.51           78.25
ResNet50 [41]          100                    87.50        85.12           86.20
Adaptive-VGG [25]      100                    91.55        90.37           92.17
HDL [26]               100                    90.76        90.53           91.20
DLN-NB [27]            100                    94.95        93.17           96.28
Our method (80-20%)    50                     97.70        96.24           98.24
Our method (75-25%)    50                     97.24        96.51           98.14
Our method (70-30%)    50                     98.46        97.13           99.04